12 research outputs found

    Comprehensive Review of Opinion Summarization

    The abundance of opinions on the web has kindled the study of opinion summarization over the last few years. Researchers have introduced various techniques and paradigms for solving this task. This survey systematically investigates the different techniques and approaches used in opinion summarization. We provide a multi-perspective classification of the approaches used and highlight some of their key weaknesses. This survey also covers the evaluation techniques and data sets used in studying the opinion summarization problem. Finally, we provide insights into some of the challenges that remain to be addressed, as this will help set the trend for future research in this area.
    Unpublished; not peer reviewed.

    Question Processing and Clustering in INDOC: A Biomedical Question Answering System

    The exponential growth in the volume of publications in the biomedical domain has made it impossible for an individual to keep pace with the advances. Even though evidence-based medicine has gained wide acceptance, physicians are unable to access the relevant information in the required time, leaving most of their questions unanswered. This accentuates the need for fast and accurate biomedical question answering systems. In this paper we introduce INDOC, a biomedical question answering system based on novel ideas for indexing and for extracting the answers to the questions posed. INDOC displays the results in clusters to help the user arrive at the most relevant set of documents quickly. Evaluation was done against the standard OHSUMED test collection. Our system achieves high accuracy and minimizes user effort.

    Autonomous agents for serving complex information needs

    Over the past few decades, two prominent paradigms for information seeking have been developed: search engines and recommendation systems. However, neither of these is well suited to serving queries that represent complex information needs (e.g. medical case-based queries). As a result, users increasingly turn to web communities such as HealthBoards and Yahoo! Answers, making them extremely popular. However, not all queries posted there receive informative answers or are answered in a timely manner. In this work we present a novel paradigm for information service in which autonomous agents help dissatisfied users in web communities by proactively posting responses to their unresolved queries. The main contribution of this work is to concretely define three application tasks based on this paradigm in the healthcare domain, and to show that it is indeed feasible to develop agents capable of generating meaningful responses with high accuracy. The first task involved designing an agent for resolving physician case-based queries using literature data. We addressed the problem via methods that utilized available biomedical semantic resources and showed that a precision at 10 of up to 0.48 could be achieved. The second study involved resolving layperson queries on web forums by finding similar discussion threads. This task was more challenging due to the noisy nature of forum data and the unsuitability of existing semantic resources. We developed novel shallow semantic information extraction techniques for the problem, and our methods utilized them to achieve a best precision at 5 of 0.54. Finally, the third task was to design an autonomous agent for resolving general healthcare questions on community question answering (cQA) websites. This task required more detailed semantic information in the form of a database containing precise medical entities, verbose text descriptions, and the relations between them. These were obtained by using health information websites as an information source. We proposed a principled probabilistic model for the problem, and it was found to resolve over 30% of the questions correctly. Overall, our results clearly suggest that autonomous agents are not only feasible, but can also deliver considerable value to both expert and layperson users of web forums and cQA websites. We believe such autonomous agents have great potential and that our work opens up an exciting new area of research.
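
    The "precision at 10" and "precision at 5" figures quoted in this abstract refer to a standard ranking metric: the fraction of the top k returned items that are relevant. The sketch below shows the conventional definition only. The abstract does not describe the exact evaluation protocol, so the function name, the document IDs, and the relevance judgments here are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Return the fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Divide by k rather than len(top_k), so a result list shorter
    # than k is penalized for the slots it leaves empty.
    return hits / k


# Hypothetical ranking returned for one query, best match first.
ranked = ["d7", "d2", "d9", "d4", "d1"]
# Hypothetical set of documents judged relevant for that query.
relevant = {"d2", "d4", "d8"}

print(precision_at_k(ranked, relevant, 5))  # 0.4
```

    Under this definition, a precision at 5 of 0.54 averaged over many queries would mean that, on average, between two and three of the five top-ranked results were judged relevant.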

    Mining Semi-Structured Online Knowledge Bases to Answer Natural Language Questions on Community QA Websites

    Over the past few years, community QA websites (e.g. Yahoo! Answers) have become a useful platform for users to post questions and obtain answers. However, not all questions posted there receive informative answers or are answered in a timely manner. In this paper, we show that the answers to some of these questions are available in online domain-specific knowledge bases and propose an approach to automatically discover those answers. In the proposed approach, we first mine appropriate SQL query patterns by leveraging an existing collection of QA pairs, and then use the learned query patterns to answer previously unseen questions by returning relevant entities from the knowledge base. Evaluation on a collection of health domain questions from Yahoo! Answers shows that the proposed method is effective in discovering potential answers to user questions from an online medical knowledge base.

    Reliability Prediction of Webpages in the Medical Domain

    In this paper, we study how to automatically predict the reliability of web pages in the medical domain. Assessing the reliability of online medical information is especially critical, as it may influence vulnerable patients seeking help online. Unfortunately, there are no automated systems currently available that can classify a medical webpage as reliable, while manual assessment cannot scale up to process the large number of medical pages on the Web. We propose a supervised learning approach to automatically predict the reliability of medical webpages. We developed a gold standard dataset using the standard reliability criteria defined by the Health On the Net Foundation and systematically experimented with different link- and content-based feature sets. Our experiments show promising results, with prediction accuracies of over 80%. We also show that our proposed prediction method is useful in applications such as reliability-based re-ranking and automatic website accreditation.